Evaluation of NUMA Memory Management Through Modeling and Measurements

Authors

  • Richard P. LaRowe
  • Carla Schlatter Ellis
  • Mark A. Holliday
Abstract

The class of NUMA (nonuniform memory access time) shared memory architectures is becoming increasingly important with the desire for larger scale multiprocessors. In such machines, the placement and movement of code and data are crucial to performance. The operating system can play a role in managing placement through the policies and mechanisms of the virtual memory subsystem. In this paper, we explore dynamic page placement policies using two approaches that complement each other in important ways. On one hand, we measure the performance of parallel programs running on the experimental DUnX operating system kernel for the BBN GP1000, which supports a highly parameterized dynamic page placement policy. We also develop and apply an analytic model of memory system performance of a Local/Remote NUMA architecture based on approximate mean-value analysis techniques. The model assumes that a simple workload model based on a few parameters can often provide insight into the general behavior of real applications. The model is validated against experimental data obtained with DUnX while running a synthetic workload. The results of this validation show that in general, model predictions are quite good, though in some cases the model fails to include the effect of subtle behaviors in the implementation. Experiments investigate the effectiveness of dynamic page placement and, in particular, dynamic multiple-copy page placement, the cost of replication/coherency fault errors, and the cost of errors in deciding whether a page should move or be remotely referenced.
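As a rough illustration of what "approximate mean-value analysis" can look like in practice, the sketch below implements the textbook Schweitzer approximation for a single-class closed queueing network. It is not the model from the paper: the function name `schweitzer_amva`, the choice of stations (local memory, interconnect, remote memory), and all service demands are hypothetical values chosen only for the example.

```python
def schweitzer_amva(demands, n_customers, think_time=0.0, tol=1e-10, max_iter=10000):
    """Schweitzer approximate MVA for a single-class closed network.

    demands     -- service demand per station (time units per request cycle)
    n_customers -- number of circulating customers (here: processors), >= 1
    think_time  -- time a customer spends computing between memory requests
    Returns (throughput, residence_times, queue_lengths).
    """
    k = len(demands)
    # Start from an even spread of customers over the stations.
    queue = [n_customers / k] * k
    for _ in range(max_iter):
        # An arriving customer is assumed to see (N-1)/N of the mean queue.
        resid = [d * (1.0 + (n_customers - 1) / n_customers * q)
                 for d, q in zip(demands, queue)]
        throughput = n_customers / (think_time + sum(resid))
        # Little's law gives the updated mean queue length per station.
        new_queue = [throughput * r for r in resid]
        converged = max(abs(a - b) for a, b in zip(new_queue, queue)) < tol
        queue = new_queue
        if converged:
            break
    return throughput, resid, queue


if __name__ == "__main__":
    # Hypothetical demands: local memory, interconnect, remote memory.
    x, r, q = schweitzer_amva([0.2, 0.5, 0.3], n_customers=8, think_time=1.0)
    print("throughput:", x)
    print("residence times:", r)
    print("queue lengths:", q)
```

Raising the demand on the remote-memory station in such a sketch is one simple way to see how contention for remote references could depress throughput, which is the kind of question a page placement policy has to weigh.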

Similar resources

Comparative Modeling and Evaluation of CC-NUMA and COMA on Hierarchical Ring Architectures

Parallel computing performance on scalable shared-memory architectures is affected by the structure of the interconnection networks linking processors to memory modules and on the efficiency of the memory/cache management systems. Cache Coherence Nonuniform Memory Access (CC-NUMA) and Cache Only Memory Access (COMA) are two effective memory systems, and the hierarchical ring structure is an eff...

Performance Prediction and Evaluation of Parallel Processing on a NUMA Multiprocessor

Non-Uniform Memory Access (NUMA) architectures make it possible to build large-scale shared memory multiprocessor systems in comparison with non-scalable Uniform Memory Access (UMA) architectures. Most NUMA multiprocessor operations such as scheduling and synchronizing processes, accessing data from processors to memory models and allocating distributed memory space to different processors, are p...

Performance Evaluation of Memory Allocation Schemes on CC-NUMA Multiprocessors

Cache Coherent Non-Uniform Memory Access (CC-NUMA) architectures have received strong interest from both academia and industry. This paper studies the performance impact of design choices at different levels of address and memory mapping on CC-NUMA architectures. Through execution-driven simulations of five numerical programs, we find close interactions between data allocation, global address t...

Memory Latency Reduction with Fine-grain Migrating Threads in NUMA Shared-memory Multiprocessors

In order to fully realize the potential performance benefits of large-scale NUMA shared memory multiprocessors, efficient techniques to reduce/tolerate long memory access latencies in such systems are to be developed. This paper discusses the concept, software and hardware support for memory latency reduction through fine-grain non-transparent thread migration, referred to as mobile multithread...

Journal:
  • IEEE Trans. Parallel Distrib. Syst.

Volume 3  Issue 

Pages  -

Publication date 1992